15 research outputs found

    Universal Approximation with Deep Narrow Networks

    The classical Universal Approximation Theorem holds for neural networks of arbitrary width and bounded depth. Here we consider the natural `dual' scenario for networks of bounded width and arbitrary depth. Precisely, let $n$ be the number of input neurons, $m$ be the number of output neurons, and let $\rho$ be any nonaffine continuous function with a continuous nonzero derivative at some point. Then we show that the class of neural networks of arbitrary depth, width $n + m + 2$, and activation function $\rho$, is dense in $C(K; \mathbb{R}^m)$ for $K \subseteq \mathbb{R}^n$ with $K$ compact. This covers every activation function used in practice, and also includes polynomial activation functions, unlike the classical version of the theorem, demonstrating a qualitative difference between deep narrow networks and shallow wide networks. We then consider several extensions of this result. In particular, we consider nowhere-differentiable activation functions, density in noncompact domains with respect to the $L^p$-norm, and how the width may be reduced to just $n + m + 1$ for `most' activation functions. Comment: Accepted at COLT 2020.
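    To make the statement concrete, the density claim can be written out as follows. This is just a restatement of the result as described in the abstract; the symbol $\mathcal{N}_{\rho}^{n,m,w}$ is assumed notation (introduced here, not taken from the paper) for the class of networks with activation $\rho$, $n$ inputs, $m$ outputs, width $w$ and arbitrary depth.

    ```latex
    % Restatement of the density result above, in assumed notation.
    \[
      \forall\, f \in C(K; \mathbb{R}^m),\ \forall\, \varepsilon > 0,\
      \exists\, N \in \mathcal{N}_{\rho}^{\,n,\,m,\,n+m+2} :\quad
      \sup_{x \in K} \bigl\| N(x) - f(x) \bigr\| < \varepsilon,
      \qquad K \subseteq \mathbb{R}^n \text{ compact}.
    \]
    ```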

    Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU

    Signatory is a library for computing the signature and logsignature transforms, and related functionality. The focus is on machine learning, and as such the library includes CPU parallelism, GPU support, and backpropagation. To our knowledge it is the first GPU-capable library for these operations. Signatory implements new features not available in previous libraries, such as efficient precomputation strategies. Furthermore, several novel algorithmic improvements are introduced, producing substantial real-world speedups even on the CPU without parallelism. The library operates as a Python wrapper around C++, and is compatible with the PyTorch ecosystem. It may be installed directly via \texttt{pip}. Source code, documentation, examples, benchmarks and tests may be found at \texttt{\url{https://github.com/patrick-kidger/signatory}}. The license is Apache-2.0. Comment: Published at ICLR 2021.
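    As a brief illustration of the library's core entry points, here is a minimal sketch assuming \texttt{signatory} has been installed via pip alongside PyTorch; the batch size, stream length, number of channels and truncation depth are arbitrary choices for the example.

    ```python
    # Minimal sketch of computing signature and logsignature features with Signatory.
    import torch
    import signatory

    # A batch of 32 paths, each with 100 time steps and 5 channels.
    path = torch.rand(32, 100, 5)

    # Signature transform truncated at depth 3: one feature vector per path.
    sig = signatory.signature(path, depth=3)

    # Logsignature transform: a more compact feature set at the same truncation depth.
    logsig = signatory.logsignature(path, depth=3)

    print(sig.shape, logsig.shape)
    ```

    Both transforms support backpropagation, so they may be placed inside a larger PyTorch model and trained end-to-end.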

    Generalised Interpretable Shapelets for Irregular Time Series

    The shapelet transform is a form of feature extraction for time series, in which a time series is described by its similarity to each of a collection of `shapelets'. However, it has previously suffered from a number of limitations, such as being restricted to regularly-spaced, fully-observed time series, and forcing a choice between efficient training and interpretability. Here, we extend the method to continuous time, and in doing so handle the general case of irregularly-sampled, partially-observed multivariate time series. Furthermore, we show that a simple regularisation penalty may be used to train efficiently without sacrificing interpretability. The continuous-time formulation additionally allows the length of each shapelet (previously a discrete object) to be learnt in a differentiable manner. Finally, we demonstrate that the measure of similarity between time series may be generalised to a learnt pseudometric. We validate our method by demonstrating its performance and interpretability on several datasets; for example, we discover (purely from data) that the digits 5 and 6 may be distinguished by the chirality of their bottom loop, and that a kind of spectral gap exists in spoken audio classification.
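    For orientation, the classical discrete-time shapelet transform that this work generalises can be sketched as follows. This is an illustrative simplification (plain NumPy, fixed-length shapelets, Euclidean similarity), not the paper's continuous-time, learnt-pseudometric method.

    ```python
    # Sketch of the classical discrete shapelet transform: a series is described by its
    # minimum sliding-window distance to each shapelet in a collection.
    import numpy as np

    def shapelet_feature(series: np.ndarray, shapelet: np.ndarray) -> float:
        """Minimum mean-squared distance between the shapelet and any window of the series."""
        length = len(shapelet)
        distances = [
            np.mean((series[i:i + length] - shapelet) ** 2)
            for i in range(len(series) - length + 1)
        ]
        return float(min(distances))

    def shapelet_transform(series: np.ndarray, shapelets: list[np.ndarray]) -> np.ndarray:
        """Describe a series by its similarity to each shapelet."""
        return np.array([shapelet_feature(series, s) for s in shapelets])

    series = np.sin(np.linspace(0, 10, 200))
    shapelets = [np.sin(np.linspace(0, 1, 20)), np.zeros(20)]
    print(shapelet_transform(series, shapelets))
    ```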

    "Hey, that's not an ODE": Faster ODE Adjoints with 12 Lines of Code

    Neural differential equations may be trained by backpropagating gradients via the adjoint method, which is another differential equation typically solved using an adaptive-step-size numerical differential equation solver. A proposed step is accepted if its error, \emph{relative to some norm}, is sufficiently small; else it is rejected, the step is shrunk, and the process is repeated. Here, we demonstrate that the particular structure of the adjoint equations makes the usual choices of norm (such as $L^2$) unnecessarily stringent. By replacing it with a more appropriate (semi)norm, fewer steps are unnecessarily rejected and the backpropagation is made faster. This requires only minor code modifications. Experiments on a wide range of tasks---including time series, generative modeling, and physical control---demonstrate a median improvement of 40% fewer function evaluations. On some problems we see as much as 62% fewer function evaluations, so that the overall training time is roughly halved.
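    The change amounts to swapping the norm used in the solver's accept/reject test for the local error estimate. A minimal sketch of the idea is below; the component layout of the augmented adjoint state is an assumption made here for illustration, not the paper's code.

    ```python
    # Illustrative sketch: an adaptive-step solver accepts a proposed step if the norm of
    # its local error estimate is small enough. For the adjoint equations, the augmented
    # state contains (state, state-adjoint, parameter-adjoint) components; the idea is to
    # measure the error with a seminorm that ignores the parameter-adjoint components.
    import torch

    def rms_norm(error: torch.Tensor) -> torch.Tensor:
        # Usual choice: RMS norm over every component of the augmented state.
        return error.pow(2).mean().sqrt()

    def adjoint_seminorm(error: torch.Tensor, num_param_adjoint: int) -> torch.Tensor:
        # Seminorm: drop the trailing parameter-adjoint components (assumed layout).
        return error[:-num_param_adjoint].pow(2).mean().sqrt()

    error_estimate = torch.randn(1000)   # hypothetical local error of one proposed step
    accept_threshold = 1.0
    accept_usual = rms_norm(error_estimate) < accept_threshold
    accept_relaxed = adjoint_seminorm(error_estimate, num_param_adjoint=900) < accept_threshold
    ```

    In practice this is exposed as a small option in existing solvers; for example, recent versions of the torchdiffeq library offer a seminorm choice for the adjoint error norm.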

    Neural Controlled Differential Equations for Online Prediction Tasks

    Neural controlled differential equations (Neural CDEs) are a continuous-time extension of recurrent neural networks (RNNs), achieving state-of-the-art (SOTA) performance at modelling functions of irregular time series. In order to interpret discrete data in continuous time, current implementations rely on non-causal interpolations of the data. This is fine when the whole time series is observed in advance, but means that Neural CDEs are not suitable for use in \textit{online prediction tasks}, where predictions need to be made in real-time: a major use case for recurrent networks. Here, we show how this limitation may be rectified. First, we identify several theoretical conditions that interpolation schemes for Neural CDEs should satisfy, such as boundedness and uniqueness. Second, we use these to motivate the introduction of new schemes that address these conditions, offering in particular measurability (for online prediction), and smoothness (for speed). Third, we empirically benchmark our online Neural CDE model on three continuous monitoring tasks from the MIMIC-IV medical database: we demonstrate improved performance on all tasks against ODE benchmarks, and on two of the three tasks against SOTA non-ODE benchmarks.
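    A minimal sketch of the causal/non-causal distinction referred to above is given below (plain NumPy, illustrative only; these are not the interpolation schemes introduced in the paper).

    ```python
    # Causal vs non-causal interpolation of irregularly-sampled observations.
    import numpy as np

    obs_times = np.array([0.0, 0.7, 1.5, 3.0])   # irregular observation times
    obs_values = np.array([1.0, 2.0, 0.5, 1.5])

    def linear_interp(t: float) -> float:
        # Non-causal: interpolating towards the *next* observation uses future
        # information, so it is unsuitable for online prediction.
        return float(np.interp(t, obs_times, obs_values))

    def causal_interp(t: float) -> float:
        # Causal: hold the most recent observation, so the path at time t depends
        # only on data already seen by time t.
        idx = np.searchsorted(obs_times, t, side="right") - 1
        return float(obs_values[max(idx, 0)])

    print(linear_interp(1.0), causal_interp(1.0))
    ```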

    Neural Rough Differential Equations for Long Time Series

    Neural controlled differential equations (CDEs) are the continuous-time analogue of recurrent neural networks, as Neural ODEs are to residual networks, and offer a memory-efficient continuous-time way to model functions of potentially irregular time series. Existing methods for computing the forward pass of a Neural CDE involve embedding the incoming time series into path space, often via interpolation, and using evaluations of this path to drive the hidden state. Here, we use rough path theory to extend this formulation. Instead of directly embedding into path space, we represent the input signal over small time intervals through its \textit{log-signature}: a collection of statistics describing how the signal drives a CDE. This is the approach taken for solving \textit{rough differential equations} (RDEs), and correspondingly we describe our main contribution as the introduction of Neural RDEs. This extension has a purpose: by generalising the Neural CDE approach to a broader class of driving signals, we demonstrate particular advantages for tackling long time series. In this regime, we demonstrate efficacy on problems of length up to 17k observations and observe significant training speed-ups, improvements in model performance, and reduced memory requirements compared to existing approaches. Comment: Published at ICML 2021.
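    The preprocessing this describes, representing the signal over small intervals by its log-signature, can be sketched with the Signatory library as follows. The window size and truncation depth are arbitrary illustrative choices, and this is not the paper's code.

    ```python
    # Sketch: replace the raw path over each small time interval by its log-signature,
    # producing a shorter sequence of features that could then drive a model.
    import torch
    import signatory

    path = torch.rand(32, 1000, 5)      # batch of long time series: 1000 steps, 5 channels
    window, depth = 50, 3

    features = []
    for start in range(0, path.size(1) - 1, window):
        piece = path[:, start:start + window + 1, :]   # share endpoints between windows
        features.append(signatory.logsignature(piece, depth=depth))
    step_features = torch.stack(features, dim=1)        # (batch, num_windows, logsig_channels)
    print(step_features.shape)
    ```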

    Combined BIMA and OVRO observations of comet C/1999 S4 (LINEAR)

    We present results from an observing campaign of the molecular content of the coma of comet C/1999 S4 (LINEAR) carried out jointly with the millimeter arrays of the Berkeley-Illinois-Maryland Association (BIMA) and the Owens Valley Radio Observatory (OVRO). Using the BIMA array in autocorrelation (`single-dish') mode, we detected weak HCN J=1-0 emission from comet C/1999 S4 (LINEAR) at 14 ± 4 mK km/s averaged over the 143" beam. The three days over which emission was detected, 2000 July 21.9-24.2, immediately precede the reported full breakup of the nucleus of this comet. During this same period, we find an upper limit for HCN 1-0 of 144 mJy/beam km/s (203 mK km/s) in the 9"x12" synthesized beam of combined observations of BIMA and OVRO in cross-correlation (`imaging') mode. Together with reported values of HCN 1-0 emission in the 28" IRAM 30-meter beam, our data probe the spatial distribution of the HCN emission from radii of 1300 to 19,000 km. Using literature results of HCN excitation in cometary comae, we find that the relative line fluxes in the 12"x9", 28" and 143" beams are consistent with expectations for a nuclear source of HCN and expansion of the volatile gases and evaporating icy grains following a Haser model. Comment: 18 pages, 3 figures. Uses aastex. AJ, in press.

    The Comet Interceptor Mission

    Here we describe the novel, multi-point Comet Interceptor mission. It is dedicated to the exploration of a little-processed long-period comet, possibly entering the inner Solar System for the first time, or to an encounter with an interstellar object originating at another star. The objectives of the mission are to address the following questions: What are the surface composition, shape, morphology, and structure of the target object? What is the composition of the gas and dust in the coma, its connection to the nucleus, and the nature of its interaction with the solar wind? The mission was proposed to the European Space Agency in 2018, and formally adopted by the agency in June 2022, for launch in 2029 together with the Ariel mission. Comet Interceptor will take advantage of the opportunity presented by ESA’s F-Class call for fast, flexible, low-cost missions to which it was proposed. The call required a launch to a halo orbit around the Sun-Earth L2 point. The mission can take advantage of this placement to wait for the discovery of a suitable comet reachable with its minimum ΔV capability of 600 m s⁻¹. Comet Interceptor will be unique in encountering and studying, at a nominal closest approach distance of 1000 km, a comet that represents a near-pristine sample of material from the formation of the Solar System. It will also add a capability that no previous cometary mission has had, which is to deploy two sub-probes – B1, provided by the Japanese space agency, JAXA, and B2 – that will follow different trajectories through the coma. While the main probe passes at a nominal 1000 km distance, probes B1 and B2 will follow different chords through the coma at distances of 850 km and 400 km, respectively. The result will be unique, simultaneous, spatially resolved information on the 3-dimensional properties of the target comet and its interaction with the space environment. We present the mission’s science background leading to these objectives, as well as an overview of the scientific instruments, mission design, and schedule.

    On neural differential equations

    The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations of differential equations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will appeal to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
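    As a concrete toy example of the simplest member of this family, a neural ODE parameterises the vector field of an ODE with a neural network and treats the ODE solution as the model output. The sketch below uses the torchdiffeq library; it is a minimal illustration under those assumptions, not code from the thesis.

    ```python
    # Minimal toy neural ODE: dy/dt = f_theta(t, y) with f_theta a small neural network;
    # the model output is the ODE solution at time 1 starting from the input.
    import torch
    from torchdiffeq import odeint

    class VectorField(torch.nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, dim)
            )

        def forward(self, t, y):
            # Autonomous vector field: ignores t and maps y to dy/dt.
            return self.net(y)

    func = VectorField(dim=2)
    y0 = torch.randn(16, 2)              # batch of initial conditions
    t = torch.tensor([0.0, 1.0])
    y1 = odeint(func, y0, t)[-1]          # solve the ODE and keep the terminal value
    print(y1.shape)                       # (16, 2)
    ```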